ABSTRACT

As the number of observed Gamma-Ray Bursts (GRBs) continues to grow,
follow-up resources need to be used more efficiently in order to maximize science
output from limited telescope time. As such, it is becoming increasingly important
to identify bursts of interest as soon as possible after the event, before
the afterglows fade beyond detectability. Studying the most distant (highest redshift)
events, for instance, remains a primary goal for many in the field. Here we
present our Random forest Automated Triage Estimator for GRB redshifts (RATE
GRB-z) for rapid identification of high-redshift candidates using early-time metrics
from the three telescopes onboard Swift. While the basic RATE methodology
is generalizable to a number of resource allocation problems, here we demonstrate
its utility for telescope-constrained follow-up efforts with the primary goal
of identifying and studying high-z GRBs. For each new GRB, RATE GRB-z provides a
recommendation, based on the available telescope time, of whether the event
warrants additional follow-up resources. We train RATE GRB-z using a set consisting
of 135 Swift bursts with known redshifts, only 18 of which are z > 4.
Cross-validated performance metrics on this training data suggest that ~56%
of high-z bursts can be captured from following up the top 20% of the ranked
candidates, and ~84% of high-z bursts are identified after following up the top
~40% of candidates. We further use the method to rank 200+ Swift bursts with
unknown redshifts according to their likelihood of being high-z.

Subject headings: Gamma-ray burst: general - Methods: data analysis - Methods: statistical
1. Introduction



As the most luminous electromagnetic explosions, gamma-ray bursts (GRBs) offer a
unique probe into the distant universe, but only if their rapidly fading afterglows are
observed before dimming beyond detectability (e.g., Wijers et al. 1998; Miralda-Escudé 1998;
Lamb & Reichart 2000; Kawai 2008; McQuinn et al. 2009). Since the launch of the Swift
satellite in November 2004 (Gehrels et al. 2004), more than 170 long duration Swift gamma-ray
bursts have had measured redshifts, but only a handful fall into the highest redshift
range that allows for the probing of the earliest ages of the universe, up to less than a billion
years after the Big Bang (Fig. 1). With a limited budget of large-aperture telescope time
accessible for deep follow-up, it is becoming increasingly important to rapidly identify these
GRBs of interest in order to capture the most interesting events without spending available
resources on more mundane events.



Along with quasars (e.g., Mortlock et al. 2011) and NIR-dropout Lyman-break galaxies
(e.g., Bouwens et al. 2010, 2011), GRBs have been established as among the most distant
objects detectable in the universe, with a spectroscopically confirmed event at z = 8.2 (GRB
090423; Tanvir et al. 2009; Salvaterra et al. 2009) and a photometric candidate at z ~ 9.4
(GRB 090429B; Cucchiara et al. 2011b). Such observations can provide valuable constraints
on star formation in the early universe, illuminate the locations and properties of some of the
earliest galaxies and stars, and probe the epoch of reionization (e.g., Tanvir & Jakobsson
2007, and references therein). Further, the relatively simple spectra of GRB afterglows
compared to other cosmic lighthouses make it easier to both identify their redshifts and
extract useful spectral features such as neutral hydrogen absorption signatures for the study
of cosmic reionization (e.g., Miralda-Escudé 1998; Barkana & Loeb 2004; Totani et al. 2006;
McQuinn et al. 2008). However, such benefits can only be realized if spectra are obtained
with large-aperture telescopes before the afterglow fades beyond the level required to obtain
a useful signal, typically within a day after the GRB.

As such, there has been a long-standing effort to extract a measure of a GRB's redshift
from its early-time, high-energy signal, with a primary goal of the rapid identification of
high-z candidates. This might appear in principle to be a straightforward exercise; for instance,
distant GRBs should on average appear fainter and longer-duration than nearby events due
to distance and cosmological time dilation, respectively. In practice, however, the large
intrinsic diversity of GRBs, as well as thresholding effects, confounds the straightforward
use of early-time observations in divulging redshift and other important properties. While
much effort has gone into tightening the correlations between high-energy properties in order
to homogenize the sample for use as a luminosity (and hence distance/redshift) predictor
(e.g., Amati et al. 2002; Ghirlanda et al. 2004; Firmani et al. 2006; Schaefer 2007), there has
been significant debate as to whether some of these relations are actually due to thresholding
effects specific to the detectors rather than intrinsic physical properties of the GRBs (e.g.,
Friedman & Bloom 2005; Butler et al. 2007, 2009). Regardless, whether these inferred
relationships are actually physical or simply detector effects would not affect their utility as a
detector-specific parameter prediction tool. By restricting ourselves to Swift events only, we
avoid the uncertainty of whether certain correlations remain when using different detectors.

With this in mind, we set out to search for indications of high-redshift GRBs in the
rich, mostly homogeneous dataset provided by 6+ years of GRB observations by the three
telescopes onboard Swift (BAT, Barthelmy et al. 2005; XRT, Burrows et al. 2005; UVOT,
Roming et al. 2005). Past studies exploring high-z indicators have used hard cuts on certain
features such as UVOT afterglow detection, burst duration, and inferred hydrogen column
density (e.g., Grupe et al. 2007; vanden Berk et al. 2008; Ukwatta et al. 2009), regression
on such features (Koen 2009, 2010), and combinations of potential GRB luminosity indicators
(Xiao & Schaefer 2009, 2011). In this work, we take a different approach by utilizing
supervised machine learning algorithms, specifically Random Forest classification, to make
follow-up recommendations for each event automatically and in real time. Particular attention
is paid to careful treatment of performance evaluation by using cross-validation (§4),
a robust methodology to guard against over-fitting and the circular practice of testing
hypotheses using the same data that suggested (and constrained) them.

The primary driving force of this study is simple: given limited follow-up time available
on telescopes, we want to maximize the time spent on high-z GRBs. To this end, we provide
a deliverable metric, explained in §3.2, to assist in the decision making process on whether
to follow up a new GRB. Real-time distribution of this metric is available for each new Swift
trigger via website and RSS feed.

The structure of this paper is as follows: in §2 we outline the collation of the data, and
describe the particular GRB features utilized in redshift classification. In §3 the Random
Forest algorithm is detailed, along with some specific challenges posed by this particular
data set. Performance metrics of the classifiers are presented in §4, and in §5 we discuss the
results of testing the classifiers on additional GRBs, both with and without known redshifts.
Finally, our conclusions are given in §6.

[Footnote] For the purposes of this study, "high-redshift" corresponds to all z > 4.0: a
compromise between only keeping the most interesting events and having enough data to train
on. However, we have explored performance of different redshift cuts; see §4.3.

[Footnote] Website: http://rate.grbz.info/

[Footnote] RSS feed: http://rate.grbz.info/rss.xml
[Figure 1: histogram; axes are redshift (z) and time since Big Bang (Gyr).]

Fig. 1. Redshift distribution of the 135 long-duration Swift GRBs in our sample (Table 2).
For the purposes of this study, "high" redshift is defined as those bursts with redshifts larger
than z = 4, which corresponds to approximately 1σ above the mean of the distribution. In
our sample, 18 bursts fall into this category (black, and in inset). In determining age since
the Big Bang, we assume a cosmology with h = 0.71, Ω_m = 0.3, and Ω_Λ = 0.7. Solid lines
show the cumulative number of GRBs as a function of redshift for high-z bursts (grey) and
all bursts (black).

2. Data Collection 

The Swift BAT constantly monitors 1.4 steradians on the sky over the energy range 15-150
keV. GRB triggering can occur either by a detection of a large gamma-ray rate increase
in the BAT detectors ("rate trigger"), or when a fainter, long-duration event is recovered after
on-board source reconstruction reveals a new significant source ("image trigger"). A rough
(~3 arcmin) position is determined, and if there are no overriding observing constraints, the
spacecraft slews to allow the XRT and UVOT to begin observations, typically between 1 and
2 minutes after the trigger. The XRT observes over the energy range 0.2-10 keV and
detects nearly all of the GRBs it can observe rapidly enough, providing positional accuracies
of 2-5 arcseconds within minutes. The UVOT is a 30 cm aperture telescope that can observe
in the range 170-650 nm. Due to the relatively blue response of this telescope, it cannot
detect sources that are highly reddened, whether by dusty environments or (more relevant to
this analysis) by high-redshift origins.

At each stage in the data collection process, information is sent to astronomers on
the ground via the Gamma-ray burst Coordinates Network (GCN), providing rapid early-time
metrics. The more detailed full data are sent to the ground in ~90 minute intervals
starting between roughly 1 and 2 hours after the burst. For our dataset, we have collected data
after various levels of processing directly from GCN notices, online tables, and automated
pipelines (Butler & Kocevski 2007; Butler et al. 2007) that process and refine the data into
more useful metrics. Tens of attributes and their estimated uncertainties (when available)
are parsed from the various sources and collated into a common format.

In order to evaluate our full dataset in an unbiased way, we restricted ourselves to
using features which have been generated for all possible past events and are automatically
generated for future events. This is the primary reason we do not include potentially
useful features such as relative spectral lag (e.g., Ukwatta et al. 2010, 2011, and references
therein), which has been utilized as a redshift indicator with smaller and pre-Swift datasets
(Murakami et al. 2003; Band et al. 2004; Zhang et al. 2006; Schaefer 2007) but requires a
larger spectral coverage than Swift alone can provide. However, our technique is easily
extendable to include additional useful features should they be homogeneously determined for
past GRBs and automatically available in real time for new events, and therefore we strongly
encourage the automated distribution of any such data products.

[Footnote] GCN: http://gcn.gsfc.nasa.gov/

[Footnote] Swift GRB table: http://swift.gsfc.nasa.gov/docs/swift/archive/grb_table.html/

[Footnote] Even with the restriction of observation by all 3 Swift telescopes, certain features
derived from model fits are nonetheless incalculable for certain GRBs from the available data.
See §3.1.1 for how our algorithm treats missing values.

Because the addition of too many features causes a decrease in classifier performance
(see §4.2), a total of 12 features were kept for our final classifier (Table 1), 10 of which were
derived from BAT gamma-ray measurements, one from XRT observations, and one from
UVOT observations. Of the 10 BAT features, 4 were parsed directly from GCN Notices, the
most rapidly available (and thus unrefined) source of information on GRBs. The parameter
t_BAT is a rough measurement of the duration of the BAT trigger event and thus a lower
limit on the total duration of the GRB. The binary feature of whether or not the event was
a rate trigger is an indicator of the signal-to-noise of an event, for only the brighter events
are detected as rate triggers, while those on the threshold of detection are image triggers.
The final two GCN features are also rough indicators of brightness: σ_BAT is the significance
(in sigma) of the detected source in the on-board reconstruction of the BAT image, and
R_peak,BAT is the peak count rate observed during the duration of the event.

Five higher-level BAT-derived attributes were pulled from online tables automatically
updated by the pipeline described in Butler et al. (2007). The feature α is the power-law
index before the peak of the Band-function fit to the gamma-ray spectrum (typically
clustered around -1). Another parameter in the Band-function fit, E_peak, is the energy
at which most of the photons are emitted. The fluence, S, is the total gamma-ray flux
(15-350 keV) integrated over the duration of the burst. S/N_max is simply the maximum
signal-to-noise achieved over the duration of the light curve. Finally, T_90 is a measure of
the burst duration, defined to be the time interval over which the middle 90% of the total
background-subtracted flux is emitted.
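
As an illustration of the T_90 definition only, the following minimal Python sketch takes a binned, background-subtracted light curve and returns the interval between the times at which 5% and 95% of the total counts have accumulated. The function name `t90`, the bin-edge convention, and the absence of interpolation within bins are assumptions made for this example; this is not the procedure of the Butler et al. (2007) pipeline.

```python
def t90(times, counts):
    """Return (t05, t95, T90) for a binned, background-subtracted light curve.

    times[i]  : end time of bin i in seconds (assumed sorted)
    counts[i] : net (background-subtracted) counts in bin i
    The 5% and 95% crossing times are taken at bin edges, without interpolation.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("total net counts must be positive")
    cumulative = 0.0
    t05 = t95 = None
    for t, c in zip(times, counts):
        cumulative += c
        frac = cumulative / total
        if t05 is None and frac >= 0.05:
            t05 = t          # first bin edge where 5% of the counts is reached
        if frac >= 0.95:
            t95 = t          # first bin edge where 95% of the counts is reached
            break
    return t05, t95, t95 - t05


# Hypothetical light curve: constant emission over one hundred 1 s bins.
example_times = list(range(1, 101))
example_counts = [1.0] * 100
t05, t95, duration = t90(example_times, example_counts)
```

For real data the bin width limits the precision of the result, so a finer binning or interpolation between bin edges would normally be preferred.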

One additional "metafeature" is derived from the BAT data. In principle, if we knew
in detail the intrinsic distributions of GRB observables (fluence, hardness, duration; see
Butler et al. 2007) as a function of redshift, measurements of these observables for a new
event could be used to directly evaluate the expected redshift. A detailed fitting of the
intrinsic distributions for Swift is presented in Butler et al. (2010), and we use the parametrized
intrinsic distributions there to calculate the posterior probability redshift distributions for
each GRB in our sample (see, e.g., Figure 8 in Butler et al. 2010). Here, we further condense
this distribution into one useful feature: P_z>4, the fraction of posterior probability at z > 4.
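
The condensation of a posterior redshift distribution into a single number can be sketched as follows. This is a minimal example assuming the posterior is available as density values on a redshift grid that contains the cut value; the function name `prob_above`, the grid, and the flat toy posterior are hypothetical and are not taken from Butler et al. (2010).

```python
def trapezoid(xs, ys):
    """Trapezoidal integral of ys sampled at the points xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def prob_above(z_grid, density, z_cut=4.0):
    """Fraction of the posterior probability at z > z_cut.

    z_grid must be sorted and is assumed to contain z_cut as a grid point;
    density need not be normalized, because the ratio removes the normalization.
    """
    if z_cut not in z_grid:
        raise ValueError("this sketch assumes z_cut lies on the grid")
    k = z_grid.index(z_cut)
    return trapezoid(z_grid[k:], density[k:]) / trapezoid(z_grid, density)


# Toy posterior: flat in redshift between z = 0 and z = 10 (illustrative only).
z_grid = [0.5 * i for i in range(21)]
flat_density = [1.0] * len(z_grid)
p_high = prob_above(z_grid, flat_density)
```

For the flat toy posterior the result is simply the fraction of the redshift range lying above the cut.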

Finally, two features are extracted from data taken by the two narrow-field instruments
onboard Swift, one each from the XRT and UVOT. The feature N_H,pc is the excess neutral
hydrogen column (above the galactic value) inferred from the XRT PC (Photon-counting
mode) data, obtained from the Butler & Kocevski (2007) pipeline. The last feature is simply
a binary measure of whether or not the GRB afterglow was detected by the UVOT.



[Footnote] For 14 events in our test set, the SWIFT_BAT_POSITION notice was not available on
the online repository, primarily due to satellite downlink problems at the time of discovery.
For these events, the relevant parameters were extracted directly from the Swift TDRSS
database (http://heasarc.nasa.gov/W3Browse/all/swifttdrss.html).

While most of these features have associated uncertainties, the proper treatment of
uncertainties in attributes is an area of ongoing research in machine learning (e.g., Carroll et al.
2006). Some methods call for the uncertainties to be treated as attributes in and of themselves,
but we found that the addition of these relatively weak features was actually detrimental
for our small dataset (see, e.g., Fig. [E]). We also considered an approach by which features
with large uncertainties were considered poor measurements and were instead marked
as missing values. However, this had a negligible effect on our final classifier performance,
so for simplicity we treat all values as precisely known.

We collated data on all Swift GRBs with rapidly available BAT data up to and including
GRB 100621A, 471 in total. Specifically, this excludes bursts which were not identified in
real time due to the event being below the standard triggering threshold or occurring while
the satellite was slewing to a new location. Of these, 39 are short GRBs (defined for the
purposes of this study to be those with T_90 < 2.0 s), which are believed to arise from a
different physical process and are thus removed from the sample. For further uniformity
in the sample, bursts without rapid (< 1 hour) XRT/UVOT follow-up are also removed,
leaving 347 events. Of the remaining long bursts in our sample, 135 had reliable redshifts
(Table 2) and were thus included in our training data set (Table 3). The additional 212
long bursts without secure redshift determinations are explored further in §5.1. Exploratory
data analysis shows preliminary indications of which of these features will be most useful for
classification. Figure 2 shows several 2D slices of the feature space, with the high-z bursts
highlighted.



[Footnote] T_90 alone is not a strong enough discriminator to definitively assign a particular
GRB to one class or another ("short" versus "long"; see Levesque et al. 2010 for discussion).
In this study, we will accept the few errant bursts from the "short" class included in our
sample as additional noise in our method.

[Footnote] The reason for this missing data is almost always due to observing constraints from
the GRB being too close to the Sun, Moon, or Earth at the time of discovery. Not removing
these bursts would introduce a bias in the sample due to the fact that events without a rapid
XRT position are far less likely to lead to an afterglow discovery, and hence, redshift
determination. A total of 15 bursts with known z were removed because of this.
Table 1. List of Features Utilized

Feature             Type          Reference
BAT Rate Trigger?   BAT Prompt    GCN Notices
σ_BAT               BAT Prompt    GCN Notices
R_peak,BAT          BAT Prompt    GCN Notices
t_BAT               BAT Prompt    GCN Notices
UVOT Detection?     NFI Prompt    GCN Notices
N_H,pc              Processed     Butler & Kocevski (2007)
α                   Processed     Butler et al. (2007)
E_peak              Processed     Butler et al. (2007)
S                   Processed     Butler et al. (2007)
S/N_max             Processed     Butler et al. (2007)
T_90                Processed     Butler et al. (2007)
P_z>4               Processed     Butler et al. (2010)


Table 2. Training Data Redshifts

GRB       Q̂_train    z        References
050223    4.30e-01   0.5915   Berger & Shin 2006
050315    3.57e-01   1.949    Kelson & Berger 2005
050318    6.86e-01   1.44     Berger & Mulchaey 2005
050319    5.90e-01   3.2425   Fynbo et al. 2005a; Jakobsson et al. 2006c; Fynbo et al. 2009b
050416A   7.68e-01   0.6535   Cenko et al. 2005

Note. Table 2 is published in its entirety in the electronic edition of The Astrophysical Journal.
A portion is shown here for guidance regarding its form and content.



Table 3. Training Data

GRB      α          E_peak    S           S/N_max   N_H,pc         T_90      σ_BAT     R_peak,BAT  Rate     t_BAT     UVOT    P_z>4
                    (keV)     (erg/cm^2)            (10^22 cm^-2)  (s)                 (ct/s)      trigger  (s)       detect
050223   -1.74e+00  6.70e+01  8.75e-07    1.34e+01  -2.37e-01      1.74e+01  9.00e+00  7.26e+02    yes      8.19e+00  no      1.74e-01
050315   ?          4.33e+01  4.32e-06    4.37e+01  9.60e-02       9.46e+01  8.00e+00  2.60e+02    yes      1.02e+00  no      9.27e-02
050318   -1.22e+00  5.01e+01  1.41e-06    4.90e+01  1.80e-02       3.10e+01  9.00e+00  2.05e+02    yes      5.12e-01  yes     6.29e-02
050319   -2.00e+00  4.47e+01  1.87e-06    1.82e+01  1.50e-02       1.54e+02  1.00e+01  2.63e+02    yes      1.02e+00  yes     1.48e-01
050416A  -7.24e-01  1.50e+01  3.40e-07    1.75e+01  2.34e-01       2.91e+00  1.10e+01  1.65e+02    yes      5.12e-01  yes     4.35e-03

Note. Table 3 is published in its entirety in the electronic edition of The Astrophysical Journal.
A portion is shown here for guidance regarding its form and content.
Fig. 2. Plot of a selection of early-time Swift features (Table 1) against each other. The
grey points show the full distribution of Swift GRBs. Bursts with known redshifts are
black, and the 18 known events with redshifts greater than 4 are overplotted in red. In
the histogram text boxes, N shows how many instances of that feature in total are shown
(anything less than the full number of instances is due to the value of that feature being
unknown for certain instances), and Max shows the maximum number of instances in any
particular bin.

3. Classification Methodology 

The resource allocation approach we have taken here naturally manifests itself as a
classification problem: deciding whether or not to follow up a new event is simply a two-class
problem of "observe" or "do not observe," and the methodology presented here can be
applied to any problem that can be broken up in this way. This was the primary motivation for
using classification instead of a regression or "pseudo-z" approach for this study. The primary
disadvantage of classification for the particular problem of high-redshift identification is that
all instances above and below the class division (chosen here to be z = 4) are treated equally;
e.g., a burst with z = 4.01 has the same influence on our inference about "high" bursts as
a burst at much higher redshift. However, classification has advantages over regression in that
it is a conceptually much simpler problem, and most of the difficulties encountered due to the
unbalanced, small dataset of interest here would only be aggravated by an extension to
regression. Further, our approach capitalizes on the fact that one of our predictors (lack
of UVOT detection) is itself a binary feature with an understood physical connection to
redshift.



3.1. Random Forest classification 



A supervised classification algorithm uses a set of training data of known class to estimate
a function for assigning data points to classes based on their features. The statistics and
machine learning communities have developed many classification algorithms, including Support
Vector Machines (SVM), Naive Bayes, Neural Networks, and Gaussian Mixture Models.
We use Random Forest (RF; Breiman 2001) for its ability to select important features, resist
overfitting the data, model nonlinear relationships, handle categorical variables, and produce
probabilistic output. These strengths, along with a record of attaining very high classification
accuracy relative to other algorithms, have led to widespread use of Random Forest in
the astronomy community (e.g., Bailey et al. 2007; Carliles et al. 2010; Dubath et al. 2011;
O'Keefe et al. 2009; Richards et al. 2011). In this work, we utilized custom R software built
around the randomForest package to generate classifiers and evaluate performance.

[Footnote] This of course would not be an issue when applying the RATE methodology to a
problem with more well-defined class boundaries, such as prioritizing follow-up of a particular
rare class of transient event.

[Footnote] Bursts with a UVOT detection must be at z < 5 due to the Lyman cutoff. This is due
to the fact that photons with wavelengths smaller (thus higher energy) than the Lyman limit of
λ = 912 Å would be almost completely absorbed by neutral gas in the host galaxy and
intergalactic star forming regions. A redshift of z = 5 might therefore be considered a natural
cutoff point for the high-z class, but due to so few training events at this high redshift
(N_z>5 = 8), we opted for the more conservative cutoff point of z = 4 (N_z>4 = 18).

Random Forest is an ensemble classifier that averages together the outputs from many
decision trees, a common example of which is Classification and Regression Trees (CART;
Breiman 1984). In RF, the decision trees are constructed by recursive binary splitting of the
high-dimensional feature space, where each split is performed with respect to a particular
feature. For example, the decision tree might split the data on feature S/N_max using value
100, in which case all observations with S/N_max > 100 are placed in one group and the rest
placed in the second group. As these are binary splits, for convenience we henceforth refer
to observations going "left" or "right" of each split as an analogue for the decision made at
that split.

For each split, the feature and specific split-point are chosen so as to best separate the
observations into the classes, by using some objective function. We use the Gini Index, a
standard objective function for classification (Breiman 1984). At any given node in a tree
and some proposed split $s$, let $N_{l,h}$ = number of high-priority (in our case, high-z)
events that go to the left of the split, and $N_{l,l}$ = number of low-priority events that go
left. Define $N_{r,h}$ and $N_{r,l}$ similarly, replacing left with right. Let
$N_l = N_{l,l} + N_{l,h}$, the total number of observations that go left. Similarly define
$N_r = N_{r,l} + N_{r,h}$ for the total number of observations that go right. The Gini
criterion is defined as

$$
G(s) = \frac{N_l}{N_l + N_r} \left(\frac{N_{l,h}}{N_l}\right) \left(\frac{N_{l,l}}{N_l}\right)
     + \frac{N_r}{N_l + N_r} \left(\frac{N_{r,h}}{N_r}\right) \left(\frac{N_{r,l}}{N_r}\right),
\tag{1}
$$

and the split that minimizes this value over the random subset of features considered at
each node is chosen. For instance, in the ideal case where the split on a particular feature
completely separates all the instances of the two classes from each other, the Gini index
reaches a minimum of 0. The splitting is done recursively, continuing down each subgroup
until all of the observations in each final group ("terminal node") are of a single class. The
process is known as "growing a tree" because each split can be visualized as generating two
branches from a single branch to produce a tree-like structure. Once a tree is constructed
from the training data, each new observation starts at the root node (the top split in the tree)
and, recursively, the splitting rules determine the terminal node to which the observation
belongs. The observation is assigned to the class of the terminal node.
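
The split selection described above can be sketched in a few lines of Python. The example below evaluates the Gini criterion from the four left/right, high/low counts and scans candidate thresholds on a single feature; the function names `gini_split` and `best_threshold` and the toy data are assumptions for illustration, and the code is not the randomForest implementation used in this work.

```python
def gini_split(n_lh, n_ll, n_rh, n_rl):
    """Gini criterion for a proposed split.

    n_lh, n_ll : high- and low-priority counts sent left
    n_rh, n_rl : high- and low-priority counts sent right
    """
    n_l = n_lh + n_ll
    n_r = n_rh + n_rl
    n_total = n_l + n_r

    def side(n_side, n_high, n_low):
        if n_side == 0:          # an empty side contributes nothing
            return 0.0
        return (n_side / n_total) * (n_high / n_side) * (n_low / n_side)

    return side(n_l, n_lh, n_ll) + side(n_r, n_rh, n_rl)


def best_threshold(values, is_high):
    """Scan thresholds on one feature; return (threshold, gini) with minimal Gini.

    Observations with value <= threshold go left, the rest go right.
    """
    best = None
    for thr in sorted(set(values)):
        left = [h for v, h in zip(values, is_high) if v <= thr]
        right = [h for v, h in zip(values, is_high) if v > thr]
        if not right:            # skip the degenerate split with nothing on the right
            continue
        g = gini_split(sum(left), len(left) - sum(left),
                       sum(right), len(right) - sum(right))
        if best is None or g < best[1]:
            best = (thr, g)
    return best


# Toy feature that separates the two classes perfectly (illustrative values only).
toy_values = [1, 2, 3, 10, 11, 12]
toy_is_high = [False, False, False, True, True, True]
threshold, gini = best_threshold(toy_values, toy_is_high)
```

In a full Random Forest this scan would be repeated at every node, but only over the random subset of features eligible at that node.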

To create the RF classifier, a sufficiently large number of decision trees are constructed,
resulting in a "forest". Each decision tree is generated from an independent bootstrap sample
(Efron 1982): samples are drawn with replacement from the original data set, resulting in a
new data set of the same size as the original, with on average about 2/3 of the original
observations present at least once. Additionally, only a random subset of the features is
eligible for splitting at each node. Many decision trees are grown, with each tree slightly
different due to the bootstrap sampling and random selection of features at each split. RF
classifies new observations by averaging the outputs of each tree in the ensemble.

[Footnote] At each node, m = 3 features were considered, guided by the default practice in the
randomForest routine of m = floor(√p), where p is the total number of features.

[Footnote] With enough trees, error rates will converge and growing additional trees will
result in no further performance improvements. Our forests are grown to 5000 trees throughout
this work in order to ensure consistency in the rankings of unknown events.

Training observations can be classified by using all trees where that observation was 
not used in the bootstrap sampling stage. This produces estimates of error rates and class 
probabilities for each observation that are not overfit to the training data. Error rates and 
probabilities computed using this method are known as "out-of-bag" estimates. 
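
The bootstrap and out-of-bag bookkeeping can be illustrated with the standard library alone. The sketch below draws bootstrap samples for a data set the size of our training set, records which observations are left out of each sample, and evaluates the floor(√p) rule for the number of features tried per node. The function name `bootstrap_indices`, the random seed, and the number of repetitions are arbitrary choices for this example.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)            # fixed seed so the sketch is reproducible
n_obs, n_features = 135, 12       # training-set size and feature count from the text

unique_fractions = []
for _ in range(2000):             # one bootstrap sample per hypothetical tree
    in_bag, oob = bootstrap_indices(n_obs, rng)
    # Every observation is either in the bag at least once or out of bag.
    assert len(set(in_bag)) + len(oob) == n_obs
    unique_fractions.append(len(set(in_bag)) / n_obs)

# Average fraction of distinct observations per sample; expected to be
# close to 1 - 1/e (about 0.63, i.e. roughly 2/3).
mean_unique = sum(unique_fractions) / len(unique_fractions)

# Number of features eligible at each node under the floor(sqrt(p)) rule.
m_try = math.floor(math.sqrt(n_features))
```

An out-of-bag prediction for a given training observation would then average only those trees whose `out_of_bag` list contains that observation.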



3.1.1. Missing feature values 

As mentioned in §2, certain features, namely α and N_H,pc, were occasionally unable to
be determined from model fits to the data and are thus missing for certain observations. We
handle missing values by imputation, where missing values for features are assigned estimated
values. For missing values of continuous features, we assigned the median of all observations
for which that feature is non-missing. Missing categorical features are assigned the mode of
all observations for which the feature is non-missing. This is one of the simplest imputation
methods and has the advantage of being transparent and computationally cheap. We experimented
with a more sophisticated imputation method, missForest, that iteratively predicts
the missing values of each feature given all the other features (Stekhoven & Bühlmann 2011),
but as it produced similar error rates to median imputation, we opted for the latter, simpler
approach in our final classifier.
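
Median and mode imputation as described above can be sketched as follows. The helper name `impute` and the use of `None` to mark a missing value are assumptions for this example; the α values are the five rows of the Table 3 excerpt (with the missing entry for GRB 050315), so the resulting median describes only that excerpt, not the full training set.

```python
from statistics import median, mode


def impute(column, categorical=False):
    """Fill missing entries (None) with the median, or the mode if categorical.

    The fill value is computed from the non-missing entries only.
    """
    observed = [v for v in column if v is not None]
    fill = mode(observed) if categorical else median(observed)
    return [fill if v is None else v for v in column]


# Continuous feature: alpha for the five bursts in the Table 3 excerpt.
alpha = [-1.74, None, -1.22, -2.00, -0.724]
alpha_filled = impute(alpha)

# Categorical feature with a hypothetical missing entry.
uvot_detect = ["no", None, "yes", "yes", "yes"]
uvot_filled = impute(uvot_detect, categorical=True)
```

Because the fill values depend only on the observed entries of each column, the result is transparent and cheap to recompute when new training events are added.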



3.1.2. Class imbalance 



A further challenge in this data set is the imbalance between classes. We are training
on 135 bursts, only 18 of which are in the high-z class, an asymmetry present in many
resource allocation problems where the goal is to prioritize the rarer events. Without
modification, standard machine learning classification algorithms applied to imbalanced data sets
attain notoriously suboptimal performance (Chawla et al. 2004), and often result in simply
classifying all unknown events as the more common class. As we care more about correctly
classifying the rarer events, misclassifications of high-z events must be punished more
strongly than vice versa. In Random Forest, classes may be weighted in order to overcome
the imbalance by altering the splits chosen by Gini and the probabilities assigned to classes
in the terminal nodes of each tree (Chen et al. 2004).



We utilized the classwt option in the randomForest package, which accounts for class
weights in the Gini index calculation (Eq. 1) when splitting at the nodes (Liaw 2011, private
communication), similar to weighting techniques used in single CART trees (Breiman 1984).
If we are weighting high-priority observations (e.g., z > 4 GRBs) by $w_h$ and low-priority
observations by $w_l$, we let

$$
N'_{l,h} = w_h N_{l,h}, \qquad N'_{l,l} = w_l N_{l,l}, \qquad
N'_{r,h} = w_h N_{r,h}, \qquad N'_{r,l} = w_l N_{r,l}.
$$

Let $N'_l = N'_{l,l} + N'_{l,h}$, the weighted total number of observations that go left.
Similarly define $N'_r = N'_{r,l} + N'_{r,h}$ for the weighted total number of observations that
go right. The Gini criterion (Eq. 1) is evaluated with the weighted values, and the split that
minimizes this value is chosen. We tested a variety of weight choices by fixing $w_l$ to be unity
and varying $w_h$ over a range of values. The results of this test are presented in §4.1, which
demonstrates the effects of class weight choice on classifier performance.
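
The weighted criterion can be sketched by scaling the four counts before evaluating Eq. 1. The function name `weighted_gini_split` and the example counts are hypothetical, and the code follows only the definitions given in the text; it makes no claim about the internal details of the classwt option.

```python
def weighted_gini_split(n_lh, n_ll, n_rh, n_rl, w_h=1.0, w_l=1.0):
    """Gini criterion evaluated on class-weighted counts.

    High-priority counts are scaled by w_h and low-priority counts by w_l
    before the unweighted formula is applied.
    """
    wn_lh, wn_ll = w_h * n_lh, w_l * n_ll
    wn_rh, wn_rl = w_h * n_rh, w_l * n_rl
    wn_l = wn_lh + wn_ll          # weighted total going left
    wn_r = wn_rh + wn_rl          # weighted total going right
    wn_total = wn_l + wn_r

    def side(n_side, n_high, n_low):
        if n_side == 0:           # an empty side contributes nothing
            return 0.0
        return (n_side / wn_total) * (n_high / n_side) * (n_low / n_side)

    return side(wn_l, wn_lh, wn_ll) + side(wn_r, wn_rh, wn_rl)


# Hypothetical split of 135 events (18 high-z): 10 high and 5 low go left.
unweighted = weighted_gini_split(10, 5, 8, 112)
upweighted = weighted_gini_split(10, 5, 8, 112, w_h=10.0)
```

With unit weights the function reduces to the unweighted criterion, and a split that separates the classes perfectly still evaluates to 0 for any choice of weights.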



3.2. RATE GRB-z: Random forest Automated Triage Estimator for GRB redshifts

With the background above in hand, we now describe our resource allocation algorithm
and its utility for the prioritization of high-z GRB follow-up. In our application, the data are
described in §2 and the classes are high- and low-redshift GRBs, with z = 4 as the boundary
between the classes. Our primary goal is to provide a decision for each new GRB: should
we devote further resources to this event or not? This decision may be different for each
astronomer, as it is dependent on the amount of follow-up time available. Implicit in this
goal is the desire to follow up on as many truly high-redshift bursts as possible, under a set
of given telescope time constraints. Directly using the results of an off-the-shelf classifier for
this task (i.e., strictly following up on events labeled as "high-priority") is suboptimal. If
too few events are labeled as high-priority, there would be an under-utilization of available
resources. If too many are being labeled as high priority, simply following up on the first
ones available would preclude any prioritization of events within this high-priority class.

These issues can be avoided by instead tailoring the follow-up decision to the resources 
available (in this case, the available telescope time devoted to high-z GRB observations). 
The RATE method works as follows: Let Q be the fraction of events one has resources to
follow up on. First we construct a Random Forest classifier using the training data with

As telescope resources are allocated by number of hours and not number of objects, we implicitly assume

known response (in this case redshift). We compute the probability of each training event
being high-priority using out-of-bag probabilities (see §3.1). For each new event, we obtain
a probability of it being high priority using the Random Forest classifier, and compute the
fraction of training bursts that received a higher probability of being high-priority than this
new burst. A new burst is assigned rank n, with n - 1 training events having a higher
probability of being high priority. Then, for N total training bursts, we obtain a learned
probability rank for the new event of $\hat{Q} := n/(N + 1)$. This leads to a simple decision metric
for each new event: if $\hat{Q}$ is less than the desired fraction of events a particular observer
wishes to follow up ($\hat{Q} < Q$), follow-up observations are recommended. For instance, if one
can afford to follow up on ~30% of all observable GRBs, then the desired follow-up fraction
is Q = 0.3, and follow-up would be recommended for all events assigned $\hat{Q} < 0.3$. An
illustration of this process in action is shown in Figure [3]. The desired fraction of follow-up
events Q can be dynamically changed without penalty; if the amount of available resources
changes, one simply needs to raise or lower this cut-off value accordingly.
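
The ranking step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, ties are counted as "not higher", and the list of training probabilities in the usage note is synthetic, chosen only so that 30 training events outrank the new burst as in the example of Figure [3].

```python
def rate_q_hat(p_new, oob_probs):
    """Learned rank n/(N + 1) of a new event among N training events.

    p_new:     classifier probability that the new event is high priority
    oob_probs: out-of-bag high-priority probabilities of the training events
    """
    # n - 1 training events have a strictly higher probability than the new event.
    n_higher = sum(1 for p in oob_probs if p > p_new)
    rank = n_higher + 1
    return rank / (len(oob_probs) + 1)


def recommend_follow_up(p_new, oob_probs, q_desired):
    """Recommend follow-up when the learned rank is below the desired fraction."""
    return rate_q_hat(p_new, oob_probs) < q_desired


# Synthetic stand-in for N = 135 out-of-bag probabilities (not real data):
# 30 values above 0.147 and 105 values below it.
training = [0.660] + [0.2] * 28 + [0.150] + [0.144, 0.142] + [0.0] * 103
q_hat = rate_q_hat(0.147, training)
```

Here `q_hat` evaluates to 31/136, so an observer with a desired fraction of 0.25 would follow up the burst, while one with 0.20 would not. Because only the cut-off changes between observers, the classifier itself never needs to be retrained when resources change.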



4. Validation of Classifier Performance



Our training data consist of 135 bursts, 18 of which are high-redshift (z > 4). Our
primary measure of performance is efficiency, defined here as the fraction of high-z bursts
that we follow up on relative to the total number of high-z GRBs that occurred
($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ high}}$). A secondary performance measure is purity, the fraction of
followed-up events that were actually high-z ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ observed}}$). We measure performance
using 10-fold cross-validation (Kohavi 1995), where 90% of the data is used to construct a
classifier and predict on the remaining 10% of events. Each line in the following performance
plots is the cross-validated performance averaged across 100 trials of 10-fold cross-validation
in order to reduce variability due to randomness in training/test subset selection.
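
As an illustration of the two performance measures defined above, the sketch below computes efficiency and purity for a set of events with known class at a chosen follow-up cut-off. The function name and argument layout are hypothetical, and the rule "follow up when the learned rank is below the cut-off" is taken from §3.2.

```python
def efficiency_and_purity(q_hat_values, is_high, q_desired):
    """Return (efficiency, purity) for a follow-up cut-off q_desired.

    q_hat_values: learned rank of each event (lower means higher priority)
    is_high:      True where the event is truly high-redshift
    """
    followed = [high for q, high in zip(q_hat_values, is_high) if q < q_desired]
    n_high_observed = sum(followed)   # high-z events that were followed up
    n_total_high = sum(is_high)       # all high-z events that occurred
    n_total_observed = len(followed)  # all events that were followed up
    efficiency = n_high_observed / n_total_high if n_total_high else float("nan")
    purity = n_high_observed / n_total_observed if n_total_observed else float("nan")
    return efficiency, purity
```

For example, with ranks `[0.1, 0.2, 0.3, 0.6, 0.9]`, truth labels `[True, False, True, False, True]` and a cut-off of 0.25, two events are followed up and one of them is high-z, giving an efficiency of 1/3 and a purity of 1/2.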



4.1. Comparison of Weight Choices 

As described in §3.1.2, one of the primary challenges in learning on this dataset is the
simple fact that there are comparatively few high-z events on which to train. If simply
getting the most classifications correct were the primary performance metric, as it is in 



here that an equal amount of resource time will be allocated to each follow-up event. This is not in general
the case, as objects that turn out to be particularly interesting may have additional resources spent on them.
However, a user's estimate of Q can always be adjusted without penalty as available resources change.

RATE GRB-z Method

• New GRB is discovered; its attributes are fed into the RF classifier.

• Classifier gives the new event P(high-z) = 0.147.

• This probability is compared to the out-of-bag probabilities from
the training set (Sec. 3.1). It falls between training events ranked #30
and #31, and is thus given rank n = 31.

• The GRB is assigned $\hat{Q} = n/(N + 1) = 31/136 = 0.23$.

• Astronomer A can follow up on 25% of events ($Q_A = 0.25$)
and thus decides to observe this GRB ($\hat{Q} < Q_A$).

• Astronomer B can follow up on 20% of events ($Q_B = 0.20$) and thus
decides not to observe this GRB ($\hat{Q} > Q_B$).

N = 135 Training Events

Rank    P(high-z)
1       0.660
...     ...
29      0.166
30      0.150
        New GRB (P = 0.147)
31      0.144
32      0.142
...     ...
135     0.000

Fig. 3. Example of the RATE GRB-z process.



many classification problems, classifying all new events as low-redshift would be considered 
a strong classifier since so few events are in the high-z class. However, since our objective is
to identify the best candidates of this rare class, we punish misclassifications of high-z GRBs
more heavily to achieve higher efficiency and purity (outlined above) for a given fraction of
followed-up events.

Thus, in selecting the best weight for our classifier, we compared the efficiency and
purity of high-z classification for various choices of the weight $w_h$ using the feature set shown
in Table [1]. While the relative probability ranking of the GRBs stayed relatively stable over
weight choices (Figure [4]), a clear trend emerges when comparing classification performance
(Figure [5]). As expected, punishing misclassifications of the smaller, more desirable high-z
class causes more of these rare events to be correctly identified. Beyond a weight of 10,
however, a ceiling is reached where further weight increases produce no change in classification
performance. This is therefore the weight chosen for all subsequent performance comparisons.

[Figure 4 axis label: log($w_h$)]

Fig. 4. Bumps plot showing the cross-validated ranking prediction $\hat{Q}$ for each GRB in the
training set over a variety of weight choices. Each line corresponds to an individual GRB,
colored by its observed redshift. Bursts with z > 4 are plotted with a thicker line. The
clustering of high-z events towards low $\hat{Q}$ is clear, illustrating the predictive power of the
classifier. The relative ranking of events remains largely stable over different penalization
weights, but performance improvements at higher weights are apparent in Figure [5], which
level off after a weight of 10.



4.2. Effects of Feature Selection 



As mentioned previously, early testing indicated that the addition of too many features rapidly
degraded the predictive power of the final classifier. This is due to a manifestation of the
so-called "curse of dimensionality" known as the Hughes phenomenon (Hughes 1968), where for
a fixed number of training instances, the predictive power decreases as the dimensionality
increases. This appears to contradict the conventional wisdom that Random Forest does
not overfit, and thus it is better to use many features. However, we note that resistance
to overfitting is different from signal being drowned in noise. With enough noisy features,




Fig. 5. The effects of different weights on classifier performance are shown via plots of
efficiency ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ high}}$; left panel) and purity ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ observed}}$; right
panel) versus fraction of GRBs followed up (see Figure [11] for how our decision criterion
Q corresponds to actual fraction followed up). Solid black lines show expected results if
selecting events by random guessing alone. Cross-validated performances of the classifier
trained with different weights are shown. Weights above 1.0 penalize misclassifications of
high-z events more strongly, and vice versa. Efficiency and purity were calculated at each
fraction of followed-up GRBs (Q, broken down into N = 135 bins) and averaged over 100
random number generator seeds to account for variance between Random Forest runs. Clear
performance increases for both metrics are shown for higher weights, but beyond a weight
of 10, identical results are achieved. For clarity, estimates of uncertainties in the curves are
not shown, but are of order those plotted in Figure [7].

correlations between class and a useless feature will happen purely by chance, preventing 
true relationships from being found. 

To visualize this effect for our data, we took our nominal feature set and continually 
added features with no predictive power (random samples from the uniform distribution) to 
quantify the degradation in performance of the resultant classifiers. The random features 
were re-generated for each of the 100 trials, and the cross-validated results are shown in 
Figure [6]. The fact that even a small number of useless features causes a noticeable decrease
in performance highlights the importance of attribute selection. However, we note that too 
much fine tuning of attribute feature selection choices — such as testing all combinations of 




Fig. 6. The effects of the addition of useless features on classifier performance are shown via
plots of efficiency ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ high}}$; left panel) and purity ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ observed}}$;
right panel) versus fraction of GRBs followed up according to our decision criterion (Q).
Solid black lines show expected results if selecting events by random guessing alone. Cross-validated
performances of the classifier trained with different amounts of useless, randomly
generated features are shown. Degradation in both efficiency and purity becomes clear with
the addition of only a few useless features, highlighting the importance of feature selection
for small, imbalanced datasets such as this one.

features and seeing which one gives the best performance — would overfit to the data and 
give an underestimate of the true error. 
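
The noise-injection experiment described above can be sketched as follows. This is an illustrative helper, not the authors' code: the function name is hypothetical, features are assumed to be stored as one list of numbers per burst, and a fresh seed per trial stands in for re-generating the random features on each of the 100 trials.

```python
import random


def add_useless_features(rows, n_useless, seed):
    """Append n_useless uniform random values to every feature row.

    rows:      list of per-burst feature lists (left unmodified)
    n_useless: number of features with no predictive power to add
    seed:      seed for the random draw; vary it from trial to trial
    """
    rng = random.Random(seed)
    return [list(row) + [rng.random() for _ in range(n_useless)] for row in rows]
```

A classifier would then be trained and cross-validated on the augmented rows for increasing values of `n_useless`, and the resulting efficiency and purity curves compared with those of the nominal feature set.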



4.3. Final Classifier 

Taking into account the above issues of multiple feature set choices, the deleterious effect
of useless features, and the performance with various weight choices to help with imbalance,
we have developed a classifier which we believe to be robust and powerful. The full feature
set utilized is shown in Table [1], and the weight chosen is described in §4.1. The final
cross-validated estimates of $\hat{Q}$ for the training data are shown alongside the corresponding redshifts
in Table [2]. By referencing a particular point on the x-axis of Figure [7] (left panel) one can
determine what fraction of high-z bursts can be detected for a particular amount of telescope
follow-up time. For example, if we are able to follow up on 20% of all GRBs detected by
Swift, then the bursts recommended for follow-up by our classifier will contain on average
56% ± 6% of all GRBs with redshift greater than 4 that occur. Following up on ~40% of
all bursts will yield 84% ± 6% of all GRBs with redshift greater than 4, and following up
on the top 50% of candidates will result in nearly all of the high-z events being observed
(96% ± 4%).

Purity is shown in the right panel of Figure [7], which describes how many of the followed-up
bursts will actually be high-redshift. Following up on 20% of all bursts would result in
37% ± 4% of the followed-up events being high-redshift, and 28% ± 2% of followed-up bursts
would be high-redshift if 40% of GRBs were followed up on.
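
The quoted efficiency and purity values can be cross-checked against each other, since both share the same numerator. The sketch below assumes that exactly the stated fraction of the 135 training bursts is followed up and that 18 of them are high-z; the function name is hypothetical.

```python
def purity_from_efficiency(efficiency, followed_fraction, n_high=18, n_total=135):
    """Purity implied by an efficiency at a given follow-up fraction.

    Number of high-z events observed: efficiency * n_high
    Number of events observed:        followed_fraction * n_total
    """
    return efficiency * n_high / (followed_fraction * n_total)


purity_at_20 = purity_from_efficiency(0.56, 0.20)  # 10.08 / 27
purity_at_40 = purity_from_efficiency(0.84, 0.40)  # 15.12 / 54
```

These evaluate to about 0.37 and 0.28, in agreement with the purities of 37% and 28% quoted above.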

As the high/low class division of z = 4 was relatively arbitrary, for completeness we
also re-trained the classifier and calculated performance results using cutoff values of z = 3.5
(Fig. [8]) and z = 3 (Fig. [9]). Note that while the sample size of 'high' events more than doubles
by lowering the cutoff value to z = 3, the resultant efficiency decreases significantly. We
attribute this effect to a decrease in the predictive power of certain attributes at lower redshift.
For instance, the z > 3 population has proportionally many more instances of UVOT
detections in its 'high-z' class than the z > 4 population, which reduces its effectiveness as
a discriminating feature.

4.4. Feature Importance 

There are several complications in identifying the relative importance of features in
contributing to selecting high-z candidates. To an extent, simple scatter plots such as those
in Figure [2] can give an indication as to what features are best at separating the classes, but
these fail to account for the complex interactions between features occurring within the RF
classification. The effects of removing features from the dataset and then re-constructing the
classifier give another indication of feature importance, but fail to account for redundancy
in the features; if two features have similar predictive properties, removing one will just
cause the other to take its place. Nevertheless, such an experiment can be illustrative, and
the results are shown in Figure [10]. In general, the removal of an individual feature does
not cause a significant change in performance, and the small changes that do occur trend
toward a degradation in the number of high-z bursts identified, implying that few if any of
the features in the dataset are useless. The features that cause the largest degradation in
performance upon their removal are $\alpha$, $R_{\mathrm{peak,BAT}}$, and $S/N_{\mathrm{max}}$, indicating that these features
are both useful predictors and are not fully redundant with other features. Note that the
slight improvement in performance from the removal of the temporal features $T_{90}$ and $t_{\mathrm{BAT}}$
is consistent with these values having little-to-no predictive power, in agreement with the
recent findings of Kocevski & Petrosian (2011) showing a lack of time dilation signatures in
GRB light curves.

Fig. 7. Efficiency ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ high}}$; left panel) and purity
($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ observed}}$; right panel) versus fraction of GRBs followed up according
to our decision criterion (Q) with a high-z cutoff of z = 4. 18 bursts (~13% of our
training set) are z > 4.0. The curve uncertainties shown are 1σ standard deviations from
the mean value across all seeds.
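
The feature-removal experiment of §4.4 amounts to retraining the classifier once per feature, each time with that feature left out. The sketch below only builds the reduced feature lists; the function name is hypothetical, the labels are shorthand taken from the column headings of the data tables, and whether all of them belong to the default feature set of Table [1] is an assumption.

```python
def leave_one_feature_out(features):
    """Map each removed feature to the reduced feature list used for retraining."""
    return {removed: [f for f in features if f != removed] for removed in features}


# Shorthand labels (hypothetical identifiers, not names from the authors' code).
FEATURES = ["alpha", "E_peak", "S", "S/N_max", "N_H,pc", "T_90", "sigma_BAT",
            "R_peak,BAT", "rate_trigger", "t_BAT", "UVOT_detect", "P_z>4"]

subsets = leave_one_feature_out(FEATURES)
```

Each entry of `subsets` would be cross-validated in the same way as the full set, and the change in efficiency and purity recorded. As the text notes, a small change for one entry may reflect redundancy with the remaining features and not a lack of predictive power.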



5. Discussion 

5.1. Calibration on GRBs with unknown redshifts 

A natural application of our methodology is to use it to predict the follow-up metric $\hat{Q}$
for the remaining majority of long-duration Swift GRBs with no known redshift, providing
a list of the top candidates predicted to be high-z. This application is precisely how RATE
GRB-z could be used in practice on new events, albeit one at a time rather than on many at
once. We caution that due to the natural selection effect of GRBs with measured redshifts
having a higher likelihood of being brighter events, the bursts with unknown redshifts are
likely to comprise a somewhat different redshift distribution than our training dataset. The
primary consequence of this concerns the interpretation of the user-desired follow-up fraction Q
and the prioritization parameter $\hat{Q}$. In principle, the classifier was calibrated such that, over
time, a fraction Q of new events will have affirmative follow-up recommendations (that is,
events such that $\hat{Q} < Q$). However, this will not necessarily be the case if the full redshift
distribution of GRBs makes up a different population than our training data.

To test this, we calculated $\hat{Q}$ for each of the remaining 212 GRBs with unknown redshift
that met our culling criteria outlined in §2. From this we could calculate the fraction of
GRBs followed up ($\hat{Q} < Q$) for each cutoff value of Q. The results of this test are shown
in Figure [11]. For the chosen weight of 10 (see §4.1), the $\hat{Q}$-values are well calibrated with
the final follow-up recommendations. The resultant $\hat{Q}$ priorities are listed in Table [4]. These
values can be interpreted as a ranking of which of these past events without secure redshift
determinations are most likely to be at high redshift.

Fig. 8. Efficiency ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ high}}$; left panel) and purity
($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ observed}}$; right panel) versus fraction of GRBs followed up according
to our decision criterion (Q) with a high-z cutoff of z = 3.5. 26 bursts (~19% of our
training set) are z > 3.5. The curve uncertainties shown are 1σ standard deviations from
the mean value across all seeds.

Fig. 9. Efficiency ($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ high}}$; left panel) and purity
($N_{\mathrm{high\ observed}}/N_{\mathrm{total\ observed}}$; right panel) versus fraction of GRBs followed up according
to our decision criterion (Q) with a high-z cutoff of z = 3.0. 40 bursts (~30% of our
training set) are z > 3.0. The curve uncertainties shown are 1σ standard deviations from
the mean value across all seeds.

Fig. 10. Change in efficiency (left) and purity (right) by removing individual features from
the default feature set listed in Table [1]. The standard deviation from the mean value across
all seeds for the default dataset is shown in grey. The lack of degradation in performance
by the removal of a feature does not necessarily imply that it has no predictive power, only
that it may be redundant with other features. Most of the features do not cause a significant
change in performance once removed from the dataset. However, the removal of a few of
the individual features does cause a degradation in performance larger than what would be
expected by chance, implying that these features are both important and not completely
redundant. Note that the relative changes in purity and efficiency are equal in both
plots, as only the numerator of each metric is changing ($N_{\mathrm{high\ observed}}$), but we show both
values for consistency.



Table 4. Test Data

GRB       $\hat{Q}$  alpha       E_peak     S           S/N_max    N_H,pc         T_90       sigma_BAT  R_peak,BAT  Rate     t_BAT      UVOT    P_z>4
                                 (keV)      (erg/cm^2)             (10^22 cm^-2)  (s)                   (ct/s)      trigger  (s)        detect
050215A   3.19e-01   -1.29e+00   4.14e+02   1.34e-06    1.02e+01   ?              6.65e+01   9.00e+00   6.94e+02    yes      8.19e+00   no      9.81e-02
050215B   1.78e-01   ?           3.01e+01   2.86e-07    1.44e+01   5.70e-02       8.50e+00   8.00e+00   3.00e+02    yes      2.05e+00   no      1.06e-01
050219A   4.22e-01   1.87e-02    1.00e+02   4.91e-06    5.08e+01   9.10e-02       2.50e+01   8.00e+00   1.93e+02    yes      1.02e+00   no      1.12e-01
050219B   7.33e-01   -8.94e-01   1.12e+02   1.94e-05    7.19e+01   8.80e-02       2.09e+01   1.70e+01   4.09e+02    yes      1.02e+00   no      2.73e-02
050326    7.04e-01   -1.04e+00   3.41e+02   1.70e-05    1.33e+02   3.80e-02       3.02e+01   2.10e+01   1.84e+04    yes      5.12e-01   no      5.67e-02

Note. Table [4] is published in its entirety in the electronic edition of The Astrophysical Journal. A portion is shown here for guidance regarding its form and content.

[Figure 11 panels: "Q calibration on training set" and "Q calibration on GRBs with unknown z"; axis label: Q]

Fig. 11. Here we quantify the calibration of Q; namely, how well the user-desired
follow-up fraction Q corresponds to the actual number of bursts recommended to be followed
up by the algorithm ($\hat{Q} < Q$). The left figure shows the self-calibration of the cross-validated
training set, which aligns as expected. The right plot shows that the calibration on the test set
is good, especially at low Q. At larger Q, there is a slight departure from the diagonal,
implying a follow-up recommendation of more events than expected at these values. This can
be attributed to the differing populations between the training set (with measured redshifts)
and test set (with unknown redshifts), as illustrated in Figure [2]. This slight discrepancy
is not surprising, as low brightness events without UVOT detections are naturally more
difficult to obtain redshifts for.
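
The calibration test described in §5.1 and Figure [11] compares each desired follow-up fraction with the fraction of events whose learned rank falls below it. A minimal sketch is given below; the function name and the grid of cut-offs are hypothetical, and perfectly calibrated ranks would return values close to the cut-offs themselves.

```python
def calibration_curve(q_hat_values, q_grid):
    """Fraction of events recommended for follow-up at each cut-off in q_grid.

    q_hat_values: learned ranks of the events being tested
    q_grid:       desired follow-up fractions to evaluate
    """
    n_events = len(q_hat_values)
    return [sum(1 for q in q_hat_values if q < q_cut) / n_events for q_cut in q_grid]


# Toy input: ten evenly spaced ranks, evaluated at three cut-offs.
toy_curve = calibration_curve([0.1 * k for k in range(1, 11)], [0.25, 0.55, 1.01])
```

A curve lying above the diagonal, as seen at larger cut-offs for the bursts with unknown redshift, means that more events are recommended than the observer intended, so the cut-off may need to be lowered to match the available telescope time.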



5.2. Validation Set: Application to Recent GRBs 

Between the cutoff date of our training set (June 21, 2010) and Sept. 1, 2011, there
were 15 long-duration Swift GRBs with reliable redshifts, from which we constructed
an independent validation set to test our method. The feature values for these GRBs are
presented in Table [5]. While none of these events were over our high-redshift cutoff value of
z = 4, it is still possible, though challenging, to use low-z events (either by direct redshift
measurement or by the identification of a coincident blue host galaxy) as a consistency
test. We would expect that the purity at a given Q would be lower than the fraction of
recommended follow-up events ($\hat{Q} < Q$) without a secure low-z determination. For instance,
Q = 0.2 has a purity of 37% ± 4%, so no more than ~63% of events with $\hat{Q} < 0.2$ should
be definitively low-redshift.

One of the bursts with a measured redshift, GRB 110328A, had very unusual properties and was
determined to be a potential Tidal Disruption Event (Bloom et al. 2011; Levan et al. 2011b), and was thus also
excluded from the validation set.

The validation GRBs were run through the RATE GRB-z classifier, and their resultant $\hat{Q}$
values are shown in Table [6] along with their corresponding redshifts. The smallest $\hat{Q}$ value
of these events is ~0.3, meaning that none of these events would have been recommended
for high-z follow-up for anyone wishing to observe fewer than 30% of events. While these
values are certainly consistent with our expected purity, this test is not particularly constraining,
as it would have been very unlikely for this almost-random selection of GRBs to violate this
constraint by chance alone, even if the classifier had no predictive power.

A more constraining test is the identification of high-z events with high $\hat{Q}$ for comparison
with the expected efficiency. Two events not included in our training set have had recent
high-z identifications: GRB 090429B, with strong photometric evidence for being z ~ 9.4
(Cucchiara et al. 2011b), and the spectroscopic identification of GRB 111008A at z = 4.99
(Levan et al. 2011a; Wiersema et al. 2011). The former has a $\hat{Q}$ value of ~0.185, consistent
with the expected efficiency. However, GRB 111008A has a $\hat{Q}$ of ~0.637, a value above
which we would have expected to find no more than 1% of high-z events. This outlier seems
likely due to the extreme brightness of the event (among the brightest ~10% of Swift bursts
in the observer frame, and top ~3% in the rest frame). Indeed, compared to all 18 high-z
events in the training set, GRB 111008A has the most extreme values towards the 'wrong'
end of three of the highly important features identified in §4.4 ($\alpha$, $P_{z>4}$, and $R_{\mathrm{peak,BAT}}$) and
also has the fourth largest $S/N_{\mathrm{max}}$. In later iterations of RATE GRB-z, this event (and all new
GRBs with secure redshifts) will be added to the training data to re-generate the classifier
and further improve its robustness against such outliers.



Table 5. Validation Data

GRB       $\hat{Q}$  alpha       E_peak     S           S/N_max    N_H,pc         T_90       sigma_BAT  R_peak,BAT  Rate     t_BAT      UVOT    P_z>4
                                 (keV)      (erg/cm^2)             (10^22 cm^-2)  (s)                   (ct/s)      trigger  (s)        detect
100728B   6.07e-01   -1.64e+00   8.19e+01   2.54e-06    2.06e+01   3.90e-02       1.15e+01   9.07e+00   1.47e+02    yes      1.02e+00   yes     1.01e-01
100814A   6.81e-01   -1.11e+00   1.35e+02   9.33e-06    9.80e+01   ?              1.77e+02   1.91e+01   8.34e+02    yes      1.02e+00   yes     1.80e-01
100816A   9.33e-01   -5.71e-01   1.42e+02   2.71e-06    5.80e+01   1.13e-01       2.50e+00   2.29e+01   1.42e+03    yes      1.02e+00   yes     5.55e-02
100901A   4.00e-01   -1.55e+00   1.28e+02   3.41e-06    1.78e+01   4.00e-02       4.59e+02   7.70e+00   4.50e+02    yes      8.19e+00   yes     2.25e-01
100906A   1.00e+00   -1.66e+00   1.57e+02   1.37e-05    1.36e+02   ?              1.17e+02   1.05e+01   1.91e+02    yes      5.12e-01   yes     7.39e-02
101219B   6.30e-01   -1.89e+00   4.97e+01   3.75e-06    1.00e+01   -8.00e-03      4.18e+01   7.63e+00   8.44e+02    no       6.40e+01   yes     1.07e-01
110205A   3.19e-01   -1.39e+00   9.75e+01   1.98e-05    1.50e+02   1.10e-02       2.77e+02   1.00e+01   1.48e+03    no       6.40e+01   yes     1.45e-01
110213A   9.33e-01   -1.82e+00   6.70e+01   8.77e-06    3.10e+01   4.00e-02       4.31e+01   1.21e+01   2.05e+02    yes      1.02e+00   yes     5.32e-02
110422A   1.00e+00   -6.23e-01   1.11e+02   5.17e-05    2.10e+02   1.58e-01       2.67e+01   7.19e+00   8.20e+01    yes      1.28e-01   yes     2.49e-02
110503A   9.33e-01   -8.18e-01   1.42e+02   1.43e-05    6.27e+01   2.60e-02       9.31e+00   2.04e+01   1.26e+03    yes      1.02e+00   yes     1.89e-02
110715A   9.33e-01   -1.06e+00   8.94e+01   1.40e-05    2.02e+02   1.64e-01       1.31e+01   1.19e+01   1.47e+02    yes      1.28e-01   yes     9.70e-03
110726A   5.04e-01   -2.97e-01   4.27e+01   2.07e-07    1.51e+01   -4.90e-02      5.40e+00   8.60e+00   2.24e+02    yes      1.02e+00   yes     1.14e-01
110731A   1.00e+00   -1.19e+00   4.06e+02   1.25e-05    1.30e+02   7.20e-02       4.66e+01   2.46e+01   2.32e+03    yes      1.02e+00   yes     5.09e-02
110801A   9.33e-01   -1.84e+00   6.07e+01   6.85e-06    3.56e+01   2.90e-02       4.00e+02   7.83e+00   3.50e+02    yes      4.10e+00   yes     1.98e-01
110808A   5.56e-01   ?           2.59e+01   4.27e-07    1.01e+01   2.17e-01       3.94e+01   7.19e+00   4.26e+02    yes      8.19e+00   yes     1.06e-01

5.3. Comparison to Previous Efforts



Extracting indications of redshift from promptly available information has been a continuing
goal of GRB studies since their cosmological origins were discovered nearly 15 years
ago. Several potential luminosity indicators were pursued with the optimistic goal of using
GRBs as standard candles for cosmological studies. The efficacy of individual indicators
toward this goal proved to be limited, and a physical origin of the relations has been contested,
with authors attributing them instead to detector thresholding or other selection
effects (Butler et al. 2007, 2009, 2010; Shahmoradi & Nemiroff 2011). While these studies
have ruled out the majority of such relations as intrinsic to GRBs themselves, prompt
properties can still be used as redshift indicators if the systematics are properly accounted
for.

Several recent studies have attempted to use combinations of features to determine
"pseudo-redshifts" for GRBs. In an extension of work by Schaefer (2007), Xiao & Schaefer
(2009, 2011) used a combination of six purported luminosity relations. Further, Koen (2009,
2010) has explored linear regression as a tool for predicting GRB redshifts using the dataset
from Schaefer (2007). As data derived from multiple satellites were used, these studies are
particularly vulnerable to the detector selection effects mentioned above.

Some works avoided the complications of regression and instead focused upon the simple
selection of high-z candidates for follow-up purposes. Campana et al. (2007) utilized
a sample of Swift-only bursts (thus avoiding detector effect biases) and used hard cuts on
three features ($T_{90}$, lack of UVOT detection, and high galactic latitude) for high-z candidate
selection. Salvaterra et al. (2007) extended this work with the additional feature of
peak photon flux.

Several issues prevent a direct comparison among the various methods of the effectiveness
at separating high-z events. These include the usage of different features from each
study, which is complicated by the lack of uniformity of features being created for each. Further,
the techniques above strictly constrain the manner in which each feature influences the
output, whereas our method is fully non-parametric and therefore more flexible. However,
the largest concern is accurate reporting of predictive performance. In particular, we caution
against the circular practice of measuring the performance of methods by applying them to
the same events from which the luminosity relations were formed. In order to prevent
overestimating the accuracy of a predictive model, one needs to test on data independent from
the training set, such as with cross-validation.

Finally, the RATE method differs from previous efforts in that it casts the problem as
one of optimal resource allocation under limited follow-up time. Prior techniques are not
explicitly calibrated to suit this purpose. Direct classification methods will either under- or
over-utilize available resources. Past regression or "pseudo-z" methods are not explicitly
calibrated to a particular follow-up decision (i.e., at what "pseudo-z" does one decide to
follow up?), though it would be possible in principle to correct for this using a transformation
which ensures that the desired follow-up fraction corresponds to the actual fraction of
bursts followed up (e.g., Figure [11]). In contrast, the RATE technique is by design applicable
to any available resource reserves, and is generally extendable to any transient follow-up
prioritization problem.



6. Conclusions 

In this paper, we presented the RATE GRB-z method for allocating follow-up telescope
resources to high-redshift GRB candidates using Random Forest classification on early-time
Swift metrics. The RATE method is generalizable to any prioritization problem that can
be parameterized as "observe" or "don't observe", and accommodates statistical challenges
such as small datasets, imbalanced classes, and missing feature values. The issue of resource
allocation is becoming increasingly important in the era of data-driven transient surveys such
as PTF, Pan-STARRS, and LSST, which provide extremely high discovery rates without a
significant increase in follow-up resources. With enough training instances of any object of
interest for a given transient survey, the RATE method can be applied to prioritize follow-up
of future high-priority candidates.

In the RATE GRB-z application, our robust, cross-validated performance metrics indicate
that by observing just 20% of bursts, one can capture 56% ± 6% of z > 4 events with a sample
purity of 37% ± 4%. Further, following up on half of all events will yield nearly all (96% ± 4%)
of the high-z events. The method provides a simple decision point for each new event: if the
prioritization value $\hat{Q}$ is smaller than the fraction of events a user wishes to allocate resources
to, then follow-up is recommended. These rapid predictions, combined with the more traditional
photometric dropout technique from simultaneous multi-filter NIR observatories (such
as PAIRITEL, GROND, and the upcoming RATIR), offer a robust tool for more efficiently
informing GRB follow-up decisions. To facilitate the dissemination of high-redshift GRB
predictions to the community, we have set up a website (http://rate.grbz.info) with
$\hat{Q}$ values for past bursts, and an RSS feed (http://rate.grbz.info/rss.xml) to provide
real-time results from our classifier on new events.